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THE BINARY GENETIC ALGORITHM 



To find a peak, an optimization algorithm searches for the maximum cost. 
This analogy leads to the problem of finding the highest point in Rocky 
Mountain National Park. A three dimensional plot of a portion of the park 
(pur search space) is shown in Figure 2.3 and a crude topographical map 
(128 X 128 points) with some of the highlights is shown in Figure 2.4. 
Locating the top of Long's Peak (14,255 ft above sea level) is the goal. 
Three other interesting features in the area include Storm Peak (13,326 ft), 
Mount Lady Washington (13,281 ft), and Chasm Lake (11,800 ft). Since 
there are many peaks in the area of interest, conventional optimization 
techniques have difficulty finding Long's Peak unless the starting point is 
in the immediate vicinity of the peak. In fact, all of the methods requiring 
a gradient of the cost.function won't work with discrete data. The genetic 
algorithm has no problem! 

2.2.1 Selecting the Parameters and the Cost Function 

A cost function generates an output from a set of input parameters (a 
chromosome). The cost function may be a mathematical function, an ex- 
periment, or a game. The object is to modify the output in some desirable 
fashion by finding the appropriate values for the input parameters. We do 
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Figure 2.3 Three-dimensional view of the cost surface with a view of Long's 
Peak. 
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Figure 2.4 Contour plot or topographical map of the cost surface around Long's 
Peak. 

this without thinking when filling a bathtub with water. The cost is the 
difference between the desired and actual temperatures of the water. The 
input parameters are how much the hot and cold spigots are turned. In this 
case, the cost function is the experimental results of feeling the resulting 
temperature. So we see that determining an appropriate cost function and 
deciding which parameters to use are intimately related. 

The genetic algorithm begins by defining a chromosome or an array of 
parameter values to be optimized. If the chromosome has N par parameters 
(an A^ r -dimensional optimization problem) given by p x , p 2 , . . . , PN par > th en 



the chromosome is written as an N par element array. 

chromosome = [p u P2*'P*,-- . >PN par ] 



(2.1) 



For instance, searching for the maximum elevation on a topographical 
map requires a cost function with input parameters of longitude (jc) and 
latitude (y) 



chromosome = [x,y] 



(2.2) 



where = 2. Each chromosome has a cost found by evaluating the cost 
function,/, at p u p 2 p Npar - 



cost = f {chromosome ) = fipu Pi, • ■ ■ ,p Npar ) 



(2.3) 
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Since we are trying to find the peak in Rocky Mountain National Park, the 
cost function is written as the negative of the elevation in order to put it 
into the form of a minimization algorithm: 

f(x,y) = -elevation (2.4) 

Often, the cost function is quite complicated, as in maximizing the gas 
mileage of a car. The user must decide which parameters of the problem 
are most important. Too many parameters bog down the genetic algorithm. 
Important parameters for optimizing the gas mileage might include size of 
the car, size of the engine, and weight of the materials. Other parameters, 
such as paint color and type of headlights, have little or no impact on the 
car gas mileage and should not be included. Sometimes the correct number 
and choice of parameters comes from experience or trial optimization runs. 
Other times we have an analytical cost function, with the parameter being 
the variables of the function. A cost function defined by /(w, x y y,z) = 
2x + 3y + z/100000 + ^/vv/9876 with all parameters lying between 1 and 
10 can be simplified to help the optimization algorithm. Since the w and z 
terms are extremely small in the region of interest, they can be discarded 
for most purposes. Thus, the four-dimensional cost function is adequately 
modeled with two parameters in the region of interest. 

Most optimization problems require constraints or parameter bounds. 
Allowing the weight of the car to go to zero or letting the car width be 
10 m are impractical parameter values. Unconstrained parameters can take 
any value. Constrained parameters come in three brands. First, hard limits 
in the form of >, <, ^, and ^ can be imposed on the parameters. When a 
parameter exceeds a bound, then it is set equal to that bound. If x has limits 
of 0 < x < 10, and the algorithm assigns x - 1 1, then x will be reassigned 
to the value of 10. Second, variables can be transformed into new variables 
that inherently include the constraints. If x has limits of 0 < x < 10, then 
x = 5 sin y + 5 is a transformation between the constrained variable x and 
the unconstrained variable y . Varying y for any value is the same as varying 
x within its bounds. This type of transformation changes a constrained 
optimization problem into an unconstrained optimization problem in a 
smooth manner. Finally, there may be a finite set of parameter values from 
which to choose, and all values lie within the region of interest. Such 
problems come in the form of selecting parts from a limited supply. 

Dependent parameters present special problems for optimization algo- 
rithms, because varying one parameter also changes the value of the other 
parameter. For example, size and weight of the car are dependent. Increas- 
ing the size of the car will most likely increase the weight as well (unless 
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cost(w, v) = $ + m 



(2.5) 
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follows: 

, $-$/<> ^ M-M lo 
cost — — + (2.6) 

%hi ~ $lo ■ M hi - Mi 0 

where the hi and lo subscripts indicate the maximum and minimum values, 
respectively. This cost lies between 0 and 2, with the lo and hi subscripts 
representing the low and high values of the parameters. If one portion of the 
cost is more important, then it can be appropriately weighted (0 < wt < 1): 

cost = ^l^ + (l-wo|^- (2.7) 

The genetic algorithm is sensitive to the parameter range and cost repre- 
sentation. Equation (2.7) allows the user excellent control over the desired 
outcome. Picking a good wt is not always easy though. 



2.2.2 Parameter Representation 

The binary genetic algorithm works with a finite (but usually extremely 
large) parameter space. This characteristic makes the genetic algorithm 
ideal for optimizing a cost that is due to parameters that can only assume a 
finite number of values. Common examples include choosing values from 
a list or parts from stock on hand. 

If a parameter is continuous, then it must be quantized. Quantizing a 
parameter or signal is an established art. First, the range is divided into 
equal quantization levels (Figure 2.6). Any value falling within one of the 
levels is set equal to the mid, high, or low value of that level. In general, 
setting the value to the mid value of the quantization level is best, because 
the largest error possible is half a level. Rounding the value to the low or 
high value of the level allows a maximum error equal to the quantization 
level. The mathematical formulas for the binary encoding and decoding of 
the nth parameter, p n , are given by 

encoding: 

Pnorm (2.8) 
Phi - Plo 

gene[m] = round |p TOmi - 2~ m - ^^n^[m]2^| (2.9) 
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Figure 2.6 Four continuous parameter values are graphed with the quantization 
levels shown. The corresponding gene or chromosome indicates the quantization 
level where the parameter value falls. Each chromosome corresponds to a low, 
mid, or high value in the quantization level. Normally, the parameter is assigned 
the mid value of the quantization level. 



decoding: 



'vgene 



p q «ant = Yl Sene[m]2- m + 2" (Af+1) 

m=l 

Qn = PquantiPhi ~ Plo) + Plo 



(2.10) 
(2.11) 



where 



Pnorm = normalized parameter, 0 < p norm < 1 
Pio = smallest parameter value 
p h i = highest parameter value 
gene[m] = binary version of p n 
round{*} = round to nearest integer 
N gen e = number of bits in the gene 
P quam = quantized version of p norm 
q n = quantized version of p n 

The genetic algorithm works with the binary encodings, but the cost 
function often requires continuous parameters. Whenever the cost function 
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is evaluated, the chromosome must first be decoded using equation (2.10). 
An example of a binary encoded chromosome that has N par parameters, 
each encoded with N gene = 10 bits, is . 



chromosome = 



n i looiooi ooi loiin l. 



gene! 



gene 2 



0000101001 

> v ' 

gene N 



Substituting each gene in this chromosome into equation (2.10) yields an 
array of the quantized version of the parameters. This chromosome has a 
total of N bits = N gene X - 10 X N par bits. 

As previously mentioned, the topographical map of Rocky Mountain 
National Park has 128 X 128 elevation points. If x and y are encoded in two 
genes, each with N gene = 7 bits, then there are 2 7 possible values for x and y. 
These values range from 40° 15' < y < 40° 16' and 105°37'30" > x 
105° 36'. Thus, a random chromosome may have the following N bits = 14- 
bit binary representation: 



chromosome — 



110001 1001 1001 



This chromosome translates into matrix coordinates of [99, 25] or longi- 
tude, latitude coordinates of [105°36'50", 40° 15'29.7"]. 



2.2.3 Initial Population 

The genetic algorithm starts with a large commune of chromosomes known 
as the initial population. This initial population has N ipop chromosomes and 
is an N ipop X N bits matrix filled with random ones and zeros generated from 

I POP = round{random(^ op ,^ /w )} 

where the function random(N ipop , N bits ) generates an N ipop X N bits matrix 
of uniform random numbers between zero and one. This type of function 
is available on all standard mathematical software. The function round{} 
rounds the numbers to the closest integer. Each row in the matrix is a 
chromosome. The chromosomes correspond to discrete values of longi- 
tude and latitude. Next, the parameters are passed to the cost function for 
evaluation. A large initial population provides the genetic algorithm with 
a nice sampling of the search space. Usually, not all the initial population 
matrix chromosomes make the cut for the iterative portion of the genetic 
algorithm. Table 2. 1 shows the initial population and their costs for the 
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